perf: bulk text block scanner bypasses fastparse per-line overhead #689

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/text-block-bulk-scanner

He-Pin commented Apr 5, 2026

Motivation

Text blocks (||| syntax) are parsed line by line through fastparse, so every line of a block incurs combinator overhead. Programs with large text blocks (templates, embedded configs) pay this cost unnecessarily.

Key Design Decision

Implement a bulk scanner that directly scans for the text block terminator (|||) using a simple character loop, bypassing the fastparse per-line combinator overhead entirely. The scanner processes the entire text block in a single pass.
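
As a rough illustration of the single-pass idea, here is a minimal Scala sketch. It is not the code from this PR: the object name `TextBlockScanSketch`, the `scanBody` signature, and the simplified handling of indentation are all assumptions made for the example. It walks the source once, copies each content line into a `java.lang.StringBuilder` by offset range, and stops at the first less-indented line, which must hold the `|||` terminator.

```scala
object TextBlockScanSketch {

  /** Hypothetical helper: scans the body of a text block in one pass.
    *
    * @param data   the full source text
    * @param start  offset of the first body line (just after the newline
    *               that follows the opening |||)
    * @param indent indentation of the first body line (assumed non-empty)
    * @return Some((body, offset just past the closing |||)), or None if the
    *         block is malformed or unterminated
    */
  def scanBody(data: String, start: Int, indent: String): Option[(String, Int)] = {
    require(indent.nonEmpty, "text block lines must be indented")
    val sb = new java.lang.StringBuilder()
    var pos = start
    while (pos < data.length) {
      if (data.charAt(pos) == '\n') {
        // Blank line: kept as-is, no indentation required.
        sb.append('\n')
        pos += 1
      } else if (data.startsWith(indent, pos)) {
        // Content line: copy the range after the indent, up to and including
        // the newline, without creating an intermediate String for the line.
        val from = pos + indent.length
        val nl = data.indexOf('\n', from)
        val end = if (nl < 0) data.length else nl + 1
        sb.append(data, from, end)
        pos = end
      } else {
        // Less-indented line: optional whitespace, then the terminator.
        var p = pos
        while (p < data.length && (data.charAt(p) == ' ' || data.charAt(p) == '\t')) p += 1
        return if (data.startsWith("|||", p)) Some((sb.toString, p + 3)) else None
      }
    }
    None // input ended before a terminator was found
  }
}
```

For the input `"|||\n  hello\n\n    world\n|||"` with `start = 4` and `indent = "  "`, the sketch yields the body `"hello\n\n  world\n"` and the offset just past the closing `|||`. Error reporting is deliberately omitted here.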

Modification

  • Add bulk text block scanning in the parser
  • Directly scan for ||| terminator without per-line fastparse dispatch
  • Preserve exact text block semantics (whitespace stripping, indentation)

Benchmark Results

JMH (JVM, 3 warmup iterations + 3 measurement iterations)

| Benchmark | Master (ms/op) | This PR (ms/op) | Change |
|---|---|---|---|
| bench.02 | 50.427 ± 38.9 | 45.838 ± 6.9 | -9.1% |
| comparison2 | 85.854 ± 188.7 | 70.746 ± 12.3 | -17.6% |
| realistic2 | 73.458 ± 66.7 | 69.255 ± 4.0 | -5.7% |

Analysis

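The improvement is modest but points in the same direction in all three benchmarks, although the error margins on master are wide (up to ± 188.7 ms/op), so the individual deltas fall within measurement noise. The benefit should be larger for programs with many or large text blocks. Because parsing is typically a small fraction of total evaluation time, whole-benchmark gains in the 5.7% to 17.6% range are in line with expectations.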

References

  • Upstream: jit branch experiment

Result

All 46 tests pass. All three benchmarks improve, with no regressions.

Replace the per-line fastparse combinator loop in tripleBarStringBody with
a custom bulk scanner that directly accesses the underlying String data.
For a 600KB text block with ~8000 lines, this eliminates ~8000 intermediate
String allocations and the Seq[String] + mkString join overhead.

Key changes:
- tripleBarStringBodyBulk: Custom scanner using IndexedParserInput.data
  for zero-copy StringBuilder.append(CharSequence, start, end) instead of
  fastparse's repX combinator which creates one String per line.
- Hybrid approach: first line still uses fastparse for proper error messages,
  subsequent lines use the bulk scanner.
- constructString: Skip string interning for strings >1024 chars (avoids
  expensive hashCode computation on 600KB strings), single-string fast path,
  pre-sized StringBuilder for multi-line blocks.
- Falls back to original fastparse path for non-IndexedParserInput.
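
A minimal Scala sketch of the constructString ideas listed above, for illustration only. The object and method names are hypothetical and the real sjsonnet code may differ. Only the 1024-character threshold, the single-string fast path, and the pre-sized builder come from the commit message. `String.intern()` is used here as a stand-in for whatever interning mechanism the parser actually uses.

```scala
object ConstructStringSketch {
  // Threshold from the commit message; everything else here is assumed.
  val InternLimit = 1024

  def construct(parts: Seq[String]): String = {
    val s =
      if (parts.length == 1) {
        // Single-string fast path: no builder, no copy.
        parts.head
      } else {
        // Pre-size the builder so it does not have to grow while appending.
        var total = 0
        parts.foreach(p => total += p.length)
        val sb = new java.lang.StringBuilder(total)
        parts.foreach(p => sb.append(p))
        sb.toString
      }
    // Per the commit message, interning a very large string is expensive,
    // so only short strings are interned.
    if (s.length > InternLimit) s else s.intern()
  }
}
```

With this shape, a single 600 KB part is returned as the same object it came in as, while short strings still share one interned instance.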

JMH large_string_template: 2.251 → 1.762 ms/op (-21.7%)
Native large_string_template: ~37% faster

Upstream: explored in he-pin/sjsonnet jit branch

He-Pin marked this pull request as ready for review April 5, 2026 08:31